Beyond Basic Search: Addressing the Limits of Semantic Similarity

Beyond Similarity

The "80% problem" arises when basic semantic search works for simple queries but fails on edge cases. When retrieval relies on similarity alone, the vector store tends to return the chunks that are numerically closest to the query. If those chunks are nearly identical, however, the LLM receives redundant information, which wastes the limited context window and crowds out broader perspectives.

Advanced Retrieval Pillars

Maximal marginal relevance (MMR): Instead of simply selecting the most similar items, MMR balances relevance and diversity to avoid redundancy. $\text{MMR} = \arg\max_{d \in R \setminus S} \left[ \lambda \cdot \text{sim}(d, q) - (1 - \lambda) \cdot \max_{s \in S} \text{sim}(d, s) \right]$ Here $q$ is the query, $R$ is the set of retrieved candidates, $S$ is the set already selected, and $\lambda \in [0, 1]$ weights relevance against diversity.
Self-querying: Uses the LLM to translate natural language into structured metadata filters (for example, filtering by "Lecture 3" or "Source: PDF").
Contextual compression: Trims the retrieved documents down to only the "high-nutrient" excerpts relevant to the query, saving tokens.
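
To make the self-querying idea above more concrete, here is a minimal sketch of the query-to-filter step. In a real system the LLM does the translation; in this sketch a small regular expression stands in for it so the example runs on its own. The names `StructuredQuery`, `parse_query`, `filtered_candidates`, and the `lecture` metadata key are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for the LLM: recognise "third lecture" or "Lecture 3".
ORDINALS = {"first": 1, "second": 2, "third": 3, "fourth": 4, "fifth": 5}
LECTURE_PATTERN = re.compile(
    r"\b(first|second|third|fourth|fifth)\s+lecture\b|\blecture\s+(\d+)\b",
    re.IGNORECASE,
)


@dataclass
class StructuredQuery:
    text: str                                    # text to search semantically
    filters: dict = field(default_factory=dict)  # exact-match metadata constraints


def parse_query(question: str) -> StructuredQuery:
    """Turn a natural-language question into text plus metadata filters."""
    filters = {}
    match = LECTURE_PATTERN.search(question)
    if match:
        word, digits = match.group(1), match.group(2)
        filters["lecture"] = ORDINALS[word.lower()] if word else int(digits)
    return StructuredQuery(text=question, filters=filters)


def filtered_candidates(chunks, query):
    """Keep only chunks whose metadata satisfies every filter."""
    return [
        c for c in chunks
        if all(c["metadata"].get(k) == v for k, v in query.filters.items())
    ]


# Toy data (assumed for illustration only).
chunks = [
    {"text": "Bayes' rule relates conditional probabilities.", "metadata": {"lecture": 3}},
    {"text": "Probability axioms were introduced.", "metadata": {"lecture": 1}},
]
query = parse_query("What was said about probability in Lecture 3?")
candidates = filtered_candidates(chunks, query)
```

The filter narrows the candidate set before any similarity ranking happens, so a chunk from the wrong lecture cannot win on similarity alone.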

The Redundancy Trap

Feeding the LLM three versions of the same paragraph does not make it smarter; it only makes the prompt more expensive. Diversity is essential for a "high-nutrient" context.
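
The trap can be reproduced on toy data. The following is a minimal sketch of greedy MMR selection, assuming cosine similarity as sim and hand-picked 2-D vectors in place of real embeddings; `mmr_select` and the sample vectors are illustrative, not taken from any library.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr_score(doc, query, chosen_docs, lam):
    """lam * relevance minus (1 - lam) * similarity to the closest chosen doc."""
    relevance = cosine(doc, query)
    redundancy = max((cosine(doc, s) for s in chosen_docs), default=0.0)
    return lam * relevance - (1 - lam) * redundancy


def mmr_select(query, docs, k, lam=0.5):
    """Greedily pick k document indices, one MMR argmax at a time."""
    selected = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:
        chosen_docs = [docs[j] for j in selected]
        best = max(
            candidates,
            key=lambda i: mmr_score(docs[i], query, chosen_docs, lam),
        )
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy "embeddings" (assumed): chunks 0 and 1 are near-duplicates,
# chunk 2 is slightly less relevant but covers a different direction.
query = [1.0, 1.0]
docs = [[1.0, 0.3], [1.0, 0.25], [0.2, 1.0]]

similarity_only = mmr_select(query, docs, k=2, lam=1.0)  # picks chunks 0 and 1
balanced = mmr_select(query, docs, k=2, lam=0.5)         # picks chunks 0 and 2
```

With `lam=1.0` the diversity term vanishes and the two near-duplicates fill the context; with `lam=0.5` the second slot goes to the chunk that adds something new.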

Knowledge Check

You want your system to answer the specific question "What did the instructor say about probability in the third lecture?" Which tool allows the LLM to automatically apply a filter for { "source": "lecture3.pdf" }?

ConversationBufferMemory

Self-Querying Retriever

Contextual Compression

MapReduce Chain

Challenge: The Token Limit Dilemma

Apply advanced retrieval strategies to solve a real-world constraint.

You are building a RAG system for a legal firm. The documents retrieved are 50 pages long, but only 2 sentences per page are actually relevant to the user's specific query. The standard "Stuff" chain is throwing an OutOfTokens error because the context window is overflowing with irrelevant text.

Step 1

Identify the core problem and select the appropriate advanced retrieval tool to solve it without losing specific nuances.

Problem: The context window limit is being exceeded by "low-nutrient" text surrounding the relevant facts.

Tool Selection: ContextualCompressionRetriever

Step 2

What specific component must you use in conjunction with this retriever to "squeeze" the documents?

Solution: Use an LLMChainExtractor as the retriever's base compressor. It processes each retrieved document and extracts only the snippets relevant to the query, so a much smaller, highly concentrated context reaches the final prompt.
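
The "squeeze" step can be illustrated without any LLM. The sketch below is a deliberately simple stand-in: it keeps only sentences that share a keyword with the query, whereas an LLM-based extractor decides relevance by prompting a model. The helper names, the stopword list, and the sample page are assumptions made for the example.

```python
import re

# Small illustrative stopword list (assumed; not exhaustive).
STOPWORDS = {"the", "for", "are", "how", "much", "this", "and", "what", "does"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keywords(text):
    """Lowercased words of 3+ letters that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def compress(document, query, min_overlap=1):
    """Keep only sentences sharing at least min_overlap keywords with the query."""
    query_terms = keywords(query)
    kept = [
        s for s in split_sentences(document)
        if len(keywords(s) & query_terms) >= min_overlap
    ]
    return " ".join(kept)


# Toy page (assumed): one relevant sentence surrounded by filler.
page = (
    "This agreement is governed by the laws of the state. "
    "The tenant must give 60 days notice before termination. "
    "Headings are for convenience only."
)
question = "How much notice is required for termination?"
compressed = compress(page, question)
```

In LangChain versions that ship these classes, the equivalent wiring is usually an extractor built with `LLMChainExtractor.from_llm(llm)` and passed as `base_compressor` to `ContextualCompressionRetriever`, alongside a `base_retriever`; the exact import paths vary by version, so check the documentation for the release in use.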